Domain Adaptation Through Phrase Generalization for Improved Statistical Machine Translation Quality
Abstract
This paper presents a method for domain adaptation (incorporating out-of-domain data) through phrase generalization (learning and using phrase templates) to improve Italian-English translation quality on the BTEC travel task. The process of phrase generalization is described. Including it in the system produced noticeable but only minor improvements, which were limited by alignment problems and a noisy lexicon. Several enhancements to the process are proposed, which are expected to yield more significant gains.
Similar resources
NICT-2 Translation System for WAT2016: Applying Domain Adaptation to Phrase-based Statistical Machine Translation
This paper describes the NICT-2 translation system for the 3rd Workshop on Asian Translation. The proposed system employs a domain adaptation method based on feature augmentation. We regarded the Japan Patent Office Corpus as a mixture of four domain corpora and improved the translation quality of each domain. In addition, we incorporated language models constructed from Google n-grams as exter...
Translation Model Based Weighting for Phrase Extraction
Domain adaptation for statistical machine translation is the task of altering general models to improve performance on the test domain. In this work, we suggest several novel weighting schemes based on translation models for adapted phrase extraction. To calculate the weights, we first phrase align the general bilingual training data, then, using domain specific translation models, the aligned ...
Detailed Analysis of different Strategies for Phrase Table Adaptation in SMT
This paper gives a detailed analysis of different approaches to adapt a statistical machine translation system towards a target domain using small amounts of parallel in-domain data. Therefore, we investigate the differences between the approaches addressing adaptation on the two main steps of building a translation model: The candidate selection and the phrase scoring. For the latter step we c...
Phrase Training Based Adaptation for Statistical Machine Translation
We present a novel approach for translation model (TM) adaptation using phrase training. The proposed adaptation procedure is initialized with a standard general-domain TM, which is then used to perform phrase training on a smaller in-domain set. This way, we bias the probabilities of the general TM towards the in-domain distribution. Experimental results on two different lectures translation t...
Connecting Phrase based Statistical Machine Translation Adaptation
Although more additional corpora are now available for Statistical Machine Translation (SMT), only the ones which belong to the same or similar domains with the original corpus can indeed enhance SMT performance directly. Most of the existing adaptation methods focus on sentence selection. In comparison, phrase is a smaller and more fine grained unit for data selection, therefore we propose a s...
Publication date: 2008